Systems, methods, and software for classifying text from judicial opinions and other documents

ABSTRACT

To reduce cost and improve accuracy, the inventors devised systems, methods, and software to aid classification of text, such as headnotes and other documents, to target classes in a target classification system. For example, one system computes composite scores based on: similarity of input text to text assigned to each of the target classes; similarity of non-target classes assigned to the input text and target classes; probability of a target class given a set of one or more non-target classes assigned to the input text; and/or probability of the input text given text assigned to the target classes. The exemplary system then evaluates the composite scores using class-specific decision criteria, such as thresholds, ultimately assigning or recommending assignment of the input text to one or more of the target classes. The exemplary system is particularly suitable for classification systems having thousands of classes.

RELATED APPLICATION

The present application is a continuation of U.S. application Ser. No. 11/215,715, which was filed on Aug. 30, 2005; which is a continuation of U.S. application Ser. No. 10/027,914, which was filed on Dec. 21, 2001, now U.S. Pat. No. 7,062,498, issued on Jun. 13, 2006; which claims priority to U.S. Provisional Application 60/336,862, which was filed on Nov. 2, 2001; each of which is incorporated herein by reference in its entirety.

COPYRIGHT NOTICE AND PERMISSION

A portion of this patent document contains material subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyrights whatsoever. The following notice applies to this document: Copyright © 2001, West Group.

TECHNICAL FIELD

The present invention concerns systems, methods, and software for classifying text and documents, such as headnotes of judicial opinions.

BACKGROUND

The American legal system, as well as some other legal systems around the world, relies heavily on written judicial opinions (the written pronouncements of judges) to articulate or interpret the laws governing resolution of disputes. Each judicial opinion is important not only to resolving a particular legal dispute, but also to resolving similar disputes in the future. Because of this, judges and lawyers within our legal system are continually researching an ever-expanding body of past opinions, or case law, for the ones most relevant to resolution of new disputes.

To facilitate these searches, companies, such as West Publishing Company of St. Paul, Minn. (doing business as West Group), not only collect and publish the judicial opinions of courts across the United States, but also summarize and classify the opinions based on the principles or points of law they contain. West Group, for example, creates and classifies headnotes (short summaries of points made in judicial opinions) using its proprietary West Key Number™ System. (West Key Number is a trademark of West Group.)

The West Key Number System is a hierarchical classification of over 20 million headnotes across more than 90,000 distinctive legal categories, or classes. Each class has not only a descriptive name, but also a unique alpha-numeric code, known as its Key Number classification.

In addition to highly-detailed classification systems, such as the West Key Number System, judges and lawyers conduct research using products, such as American Law Reports (ALR), that provide in-depth scholarly analysis of a broad spectrum of legal issues. In fact, the ALR includes about 14,000 distinct articles, known as annotations, each teaching about a separate legal issue, such as double jeopardy and free speech. Each annotation also includes citations and/or headnotes identifying relevant judicial opinions to facilitate further legal research.

To ensure their currency as legal-research tools, the ALR annotations are continually updated to cite recent judicial opinions (or cases). However, updating is a costly task given that courts across the country collectively issue hundreds of new opinions every day and that the conventional technique for identifying which of these cases are good candidates for citation is inefficient and inaccurate.

In particular, the conventional technique entails selecting cases that have headnotes in certain classes of the West Key Number System as candidates for citations in corresponding annotations. The candidate cases are then sent to professional editors for manual review and final determination of which should be cited in the corresponding annotations. Unfortunately, this simplistic mapping of classes to annotations not only sends many irrelevant cases to the editors, but also fails to send many that are relevant, both increasing the workload of the editors and limiting accuracy of the updated annotations.

Accordingly, there is a need for tools that facilitate classification or assignment of judicial opinions to ALR annotations and other legal research tools.

SUMMARY OF EXEMPLARY EMBODIMENTS

To address this and other needs, the present inventors devised systems, methods, and software that facilitate classification of text or documents according to a target classification system. For instance, one exemplary system aids in classifying headnotes to the ALR annotations; another aids in classifying headnotes to sections of American Jurisprudence (another encyclopedic style legal reference); and yet another aids in classifying headnotes to the West Key Number System. However, these and other embodiments are applicable to classification of other types of documents, such as emails.

More particularly, some of the exemplary systems classify or aid manual classification of an input text by determining a set of composite scores, with each composite score corresponding to a respective target class in the target classification system. Determining each composite score entails computing and applying class-specific weights to at least two of the following types of scores:

a first type based on similarity of the input text to text associated with a respective one of the target classes;

a second type based on similarity of a set of non-target classes associated with the input text and a set of non-target classes associated with a respective one of the target classes;

a third type based on probability of one of the target classes given a set of one or more non-target classes associated with the input text; and

a fourth type based on a probability of the input text given text associated with a respective one of the target classes.

These exemplary systems then evaluate the composite scores using class-specific decision criteria, such as thresholds, to ultimately assign or recommend assignment of the input text (or a document or other data structure associated with the input text) to one or more of the target classes.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram of an exemplary classification system 100 embodying teachings of the invention, including a unique graphical user interface 114;

FIG. 2 is a flowchart illustrating an exemplary method embodied in classification system 100 of FIG. 1;

FIG. 3 is a diagram of an exemplary headnote 310 and a corresponding noun-word-pair model 320;

FIG. 4 is a facsimile of an exemplary graphical user interface 400 that forms a portion of classification system 100;

FIG. 5 is a diagram of another exemplary classification system 500, which is similar to system 100 but includes additional classifiers; and

FIG. 6 is a diagram of another exemplary classification system 600, which is similar to system 100 but omits some classifiers.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

This description, which references and incorporates the above-identified Figures, describes one or more specific embodiments of one or more inventions. These embodiments, offered not to limit but only to exemplify and teach the one or more inventions, are shown and described in sufficient detail to enable those skilled in the art to implement or practice the invention. Thus, where appropriate to avoid obscuring the invention, the description may omit certain information known to those of skill in the art.

The description includes many terms with meanings derived from their usage in the art or from their use within the context of the description. However, as a further aid, the following exemplary definitions are presented.

-   The term “document” refers to any addressable collection or arrangement of machine-readable data.
-   The term “database” includes any logical collection or arrangement of documents.
-   The term “headnote” refers to an electronic textual summary or abstract concerning a point of law within a written judicial opinion. The number of headnotes associated with a judicial opinion (or case) depends on the number of issues it addresses.

Exemplary System for Classifying Headnotes to American Law Reports

FIG. 1 shows a diagram of an exemplary document classification system 100 for automatically classifying or recommending classifications of electronic documents according to a document classification scheme. The exemplary embodiment classifies or recommends classification of cases, case citations, or associated headnotes, to one or more of the categories represented by 13,779 ALR annotations. (The total number of annotations is growing at a rate on the order of 20-30 annotations per month.) However, the present invention is not limited to any particular type of documents or type of classification system.

Though the exemplary embodiment is presented as an interconnected ensemble of separate components, some other embodiments implement their functionality using a greater or lesser number of components. Moreover, some embodiments intercouple one or more of the components through a local- or wide-area network. (Some embodiments implement one or more portions of system 100 using one or more mainframe computers or servers.) Thus, the present invention is not limited to any particular functional partition.

System 100 includes an ALR annotation database 110, a headnotes database 120, a classification processor 130, a preliminary classification database 140, and editorial workstations 150.

ALR annotation database 110 (more generally a database of electronic documents classified according to a target classification scheme) includes a set of 13,779 annotations, which are represented generally by annotation 112. The exemplary embodiment regards each annotation as a class or category. Each annotation, such as annotation 112, includes a set of one or more case citations, such as citations 112.1 and 112.2.

Each citation identifies or is associated with at least one judicial opinion (or generally an electronic document), such as electronic judicial opinion (or case) 115. Judicial opinion 115 includes and/or is associated with one or more headnotes in headnote database 120, such as headnotes 122 and 124. (In the exemplary embodiment, a typical judicial opinion or case has about 6 associated headnotes, although cases having 50 or more are not rare.)

A sample headnote and its assigned West Key Number class identifier are shown below.

Exemplary Headnote:

In an action brought under Administrative Procedure Act (APA), inquiry is twofold: court first examines the organic statute to determine whether Congress intended that an aggrieved party follow a particular administrative route before judicial relief would become available; if that generative statute is silent, court then asks whether an agency's regulations require recourse to a superior agency authority.

Exemplary Key Number Class Identifier:

15AK229—ADMINISTRATIVE LAW AND PROCEDURE—SEPARATION OF ADMINISTRATIVE AND OTHER POWERS—JUDICIAL POWERS

In database 120, each headnote is associated with one or more class identifiers, which are based, for example, on the West Key Number Classification System. (For further details on the West Key Number System, see West's Analysis of American Law: Guide to the American Digest System, 2000 Edition, West Group, 1999, which is incorporated herein by reference.) For example, headnote 122 is associated with classes or class identifiers 122.1, 122.2, and 122.3, and headnote 124 is associated with classes or class identifiers 124.1 and 124.2.

In the exemplary system, headnote database 120 includes about 20 million headnotes and grows at an approximate rate of 12,000 headnotes per week. About 89% of the headnotes are associated with a single class identifier, about 10% with two class identifiers, and about 1% with more than two class identifiers.

Additionally, headnote database 120 includes a number of headnotes, such as headnotes 126 and 128, that are not yet assigned or associated with an ALR annotation in database 110. The headnotes, however, are associated with class identifiers. Specifically, headnote 126 is associated with class identifiers 126.1 and 126.2, and headnote 128 is associated with class identifier 128.1.

Coupled to both ALR annotation database 110 and headnote database 120 is classification processor 130. Classification processor 130 includes classifiers 131, 132, 133, and 134, a composite-score generator 135, an assignment decision-maker 136, and a decision-criteria module 137. Processor 130 determines whether one or more cases associated with headnotes in headnote database 120 should be assigned to or cited within one or more of the annotations of annotation database 110. Processor 130 is also coupled to preliminary classification database 140.

Preliminary classification database 140 stores and/or organizes the assignment or citation recommendations. Within database 140, the recommendations can be organized as a single first-in-first-out (FIFO) queue or as multiple FIFO queues based on single annotations or subsets of annotations. The recommendations are ultimately distributed to work center 150.

Work center 150 communicates with preliminary classification database 140 as well as annotation database 110 and ultimately assists users in manually updating the ALR annotations in database 110 based on the recommendations stored in database 140. Specifically, work center 150 includes workstations 152, 154, and 156. Workstation 152, which is substantially identical to workstations 154 and 156, includes a graphical-user interface 152.1 and user-interface devices, such as a keyboard and mouse (not shown).

In general, exemplary system 100 operates as follows. Headnotes database 120 receives a new set of headnotes (such as headnotes 126 and 128) for recently decided cases, and classification processor 130 determines whether one or more of the cases associated with the headnotes are sufficiently relevant to any of the annotations within ALR to justify recommending assignments of the headnotes (or associated cases) to one or more of the annotations. (Some other embodiments directly assign the headnotes or associated cases to the annotations.) The assignment recommendations are stored in preliminary classification database 140 and later retrieved by or presented to editors in work center 150 via graphical-user interfaces in workstations 152, 154, and 156 for acceptance or rejection. Accepted recommendations are added as citations to the respective annotations in ALR annotation database 110 and rejected recommendations are not. However, both accepted and rejected recommendations are fed back to classification processor 130 for incremental training or tuning of its decision criteria.

More particularly, FIG. 2 shows a flow chart 200 illustrating in greater detail an exemplary method of operating system 100. Flow chart 200 includes a number of process blocks 210-250. Though the blocks are arranged serially in the exemplary embodiment, other embodiments may reorder the blocks, omit one or more blocks, and/or execute two or more blocks in parallel using multiple processors or a single processor organized as two or more virtual machines or subprocessors. Moreover, still other embodiments implement the blocks as one or more specific interconnected hardware or integrated-circuit modules with related control and data signals communicated between and through the modules. Thus, the exemplary process flow is applicable to software, firmware, hardware, and hybrid implementations.

The remainder of the description uses the following notational system. The lower case letters a, h, and k respectively denote an annotation, a headnote, and a class or class identifier, such as a West Key Number class or class identifier. The upper case letters A, H, and K respectively denote the set of all annotations, the set of all headnotes, and the set of all Key Number classifications. Additionally, variables denoting vector quantities are in bold-faced capital letters, and elements of the corresponding vectors are denoted in lower case letters. For example, V denotes a vector, and v denotes an element of vector V.

At block 210, the exemplary method begins by representing the annotations in annotations database 110 (in FIG. 1) as text-based feature vectors. In particular, this entails representing each annotation a as a one-column feature vector, V_(a), based on the nouns and/or noun-word pairs occurring in headnotes for the cases cited within the annotation. (Other embodiments represent the headnotes as bigrams or noun phrases.) Although it is possible to use all the headnotes associated with the cases cited in the annotation, the exemplary embodiment selects from the set of all headnotes associated with the cited cases those that are most relevant to the annotation being represented. For each annotation, this entails building a feature vector using all the headnotes in all cases cited in the annotation and selecting from each case one, two, or three headnotes based on similarity between the headnotes in a cited case and those of the citing annotation and denoting the most similar headnote(s) as relevant. To determine the most relevant headnotes, the exemplary embodiment uses classifiers 131-134 to compute similarity scores, averages the four scores for each headnote, and defines as most relevant the highest scoring headnote plus those with a score of at least 80% of the highest score. The 80% value was chosen empirically.
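By way of illustration only, the selection rule described above may be sketched in Python as follows. The function names, the dictionary layout, the headnote identifiers, and the cap of three headnotes per case are assumptions made for this sketch and are not taken from the exemplary implementation.

```python
def average_scores(scores_by_headnote):
    """Average the per-classifier scores of each headnote.

    scores_by_headnote maps a (hypothetical) headnote id to its list of
    similarity scores, one score per classifier.
    """
    return {h: sum(s) / len(s) for h, s in scores_by_headnote.items()}


def select_relevant(avg_scores, ratio=0.8, max_keep=3):
    """Keep the top headnote plus those scoring at least ratio * top score."""
    if not avg_scores:
        return []
    best = max(avg_scores.values())
    ranked = sorted(avg_scores, key=avg_scores.get, reverse=True)
    kept = [h for h in ranked if avg_scores[h] >= ratio * best]
    return kept[:max_keep]  # assumed cap: at most three headnotes per case
```

In this sketch the 80% value appears as the default `ratio`, so that a different empirically chosen value can be substituted without changing the logic.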

Once selected, the associated headnotes (or alternatively the actual text of the annotations) are represented as a set of the nouns and the noun-noun, noun-verb, and noun-adjective pairs that they contain. Words in a word-pair are not necessarily adjacent, but are within a specific number of words or characters of each other, that is, within a particular word or character window. The window size is adjustable and can take values from 1 to the total number of words or characters in the headnote. Although larger windows tend to yield better performance, in the exemplary embodiment, no change in performance was observed for windows larger than 32 non-stop words. For convenience, however, the exemplary window size is set to the actual headnote size. The exemplary embodiment excludes stop words and uses the root form of all words. Appendix A shows an exemplary list of stopwords; however, other embodiments use other lists of stopwords.
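As a non-limiting sketch, the noun and noun-word-pair representation may be illustrated in Python as follows. The sketch assumes that the headnote has already been reduced to root forms, stripped of stop words, and tagged with coarse part-of-speech labels ("N", "V", "ADJ"); the tag names, the tuple encoding of a pair, and the measurement of the window in token positions are all assumptions of the sketch rather than details of the exemplary embodiment.

```python
from itertools import combinations

PAIRABLE = {"N", "V", "ADJ"}  # assumed coarse part-of-speech tags


def noun_word_features(tagged, window=None):
    """Return nouns plus noun-noun, noun-verb, and noun-adjective pairs.

    tagged is a list of (root_form, tag) tuples with stop words removed.
    A pair is emitted only when its two words lie within `window` positions.
    """
    n = len(tagged)
    if window is None:
        window = n  # exemplary setting: window equals the headnote size
    feats = [w for w, t in tagged if t == "N"]
    for i, j in combinations(range(n), 2):
        if j - i > window:
            continue
        (w1, t1), (w2, t2) = tagged[i], tagged[j]
        if t1 not in PAIRABLE or t2 not in PAIRABLE:
            continue
        if t1 == "N" and t2 == "N":
            feats.append(tuple(sorted((w1, w2))))  # unordered noun-noun pair
        elif t1 == "N":
            feats.append((w1, w2))  # noun listed first
        elif t2 == "N":
            feats.append((w2, w1))  # noun listed first
    return feats
```

Pairs that contain no noun, such as a verb-adjective pair, are deliberately skipped, consistent with the three pair types named above.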

FIG. 3 shows an example of a headnote 310 and a noun-word representation 320 in accord with the exemplary embodiment. Also shown are West Key Number classification text 330 and class identifier 340.

In a particular annotation vector V_(a), the weight, or magnitude, ofany particular element v_(a) is defined as

$\begin{matrix}{v_{a} = {tf}_{a}^{\prime} \times {idf}_{a}^{\prime},} & (1)\end{matrix}$

where tf_(a) ^(′) denotes the term frequency (that is, the total number of occurrences) of the term or noun-word pair associated with annotation a. (In the exemplary embodiment, this is the number of occurrences of the term within the set of headnotes associated with the annotation.) idf_(a) ^(′) denotes the inverse document frequency for the associated term or noun-word pair. idf_(a) ^(′) is defined as

$\begin{matrix}{{{idf}_{a}^{\prime} = {\log \left( \frac{N}{{df}_{a}^{\prime}} \right)}},} & (2)\end{matrix}$

where N is the total number of headnotes (for example, 20 million) in the collection, and df_(a) ^(′) is the number of headnotes (or more generally documents) containing the term or noun-word pair. The prime ′ notation indicates that these frequency parameters are based on proxy text, for example, the text of associated headnotes, as opposed to text of the annotation itself. (However, other embodiments may use all or portions of text from the annotation alone or in combination with proxy text, such as headnotes or other related documents.)
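For illustration, equations (1) and (2) may be sketched in Python as follows. The function name and argument layout are hypothetical, and the natural logarithm is assumed because the description does not state the base of the logarithm.

```python
import math
from collections import Counter


def tfidf_vector(headnote_terms, doc_freq, n_docs):
    """Build a sparse annotation vector per equations (1) and (2).

    headnote_terms: list of term lists, one per headnote associated with
                    the annotation (the proxy text).
    doc_freq:       maps a term to the number of headnotes in the whole
                    collection that contain it.
    n_docs:         total number of headnotes in the collection (N).
    """
    tf = Counter(term for terms in headnote_terms for term in terms)
    return {
        term: count * math.log(n_docs / doc_freq[term])  # natural log assumed
        for term, count in tf.items()
        if doc_freq.get(term, 0) > 0  # skip terms lacking collection statistics
    }
```

The same sketch could serve for the headnote-text vectors by passing a single term list, since only the source of the term counts differs.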

Even though the exemplary embodiment uses headnotes associated with an annotation as opposed to text of the annotation itself, the annotation-text vectors can include a large number of elements. Indeed, some annotation vectors can include hundreds of thousands of terms or noun-word pairs, with the majority of them having a low term frequency. Thus, not only to reduce the number of terms to a manageable number, but also to avoid the rare-word problem known to exist in vector-space models, the exemplary embodiment removes low-weight terms.

Specifically, the exemplary embodiment removes as many low-weight terms as necessary to achieve a lower absolute bound of 500 terms or a 75% reduction in the length of each annotation vector. The effect of this process on the number of terms in an annotation vector depends on their weight distribution. For example, if the terms have similar weights, approximately 75% of the terms will be removed. However, for annotations with skewed weight distributions, as few as 10% of the terms might be removed. In the exemplary embodiment, this process decreased the total number of unique terms for all annotation vectors from approximately 70 million to approximately 8 million terms.

Some other embodiments use other methods to limit vector size. For example, some embodiments apply a fixed threshold on the number of terms per category, or on the term's frequency, document frequency, or weight. These methods are generally efficient when the underlying categories do not vary significantly in the feature space. Still other embodiments perform feature selection based on measures, such as mutual information. These methods, however, are computationally expensive. The exemplary method attempts to strike a balance between these two ends.

Block 220, executed after representation of the annotations as text-based feature vectors, entails modeling one or more input headnotes from database 120 (in FIG. 1) as a set of corresponding headnote-text vectors. The input headnotes include headnotes that have been recently added to headnote database 120 or that have otherwise not previously been reviewed for relevance to the ALR annotations in database 110.

The exemplary embodiment represents each input headnote h as a vector V_(h), with each element v_(h), like the elements of the annotation vectors, associated with a term or noun-word pair in the headnote. v_(h) is defined as

$\begin{matrix}{v_{h} = {tf}_{h} \times {idf}_{H},} & (3)\end{matrix}$

where tf_(h) denotes the frequency (that is, the total number of occurrences) of the associated term or noun-word pair in the input headnote, and idf_(H) denotes the inverse document frequency of the associated term or noun-word pair within all the headnotes.

At block 230, the exemplary method continues with operation of classification processor 130 (in FIG. 1). FIG. 2 shows that block 230 itself comprises sub-process blocks 231-237.

Block 231, which represents operation of classifier 131, entails computing a set of similarity scores based on the similarity of text in each input headnote to the text associated with each annotation. Specifically, the exemplary embodiment measures this similarity as the cosine of the angle between the headnote vector V_(h) and each annotation vector V_(a). Mathematically, this is expressed as

$\begin{matrix}{S_{1} = \cos\theta_{ah} = \frac{V_{a}^{\prime} \cdot V_{h}^{\prime}}{\| V_{a} \| \times \| V_{h} \|},} & (4)\end{matrix}$

where “·” denotes the conventional dot- or inner-product operator, and V_(a) ^(′) and V_(h) ^(′) denote that respective vectors V_(a) and V_(h) have been modified to include only elements corresponding to terms or noun-word pairs found in both the annotation text and the headnote. In other words, the dot product is computed based on the intersection of the terms or noun-word pairs. ∥X∥ denotes the length of the vector argument. In this embodiment, the magnitudes are computed based on all the elements of the vector.
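As an illustrative sketch only, equation (4) may be computed over sparse vectors in Python as follows. Representing each vector as a dictionary from term to weight is an assumption of the sketch; the dot product runs over the shared terms while each magnitude uses every element of its vector, as described above.

```python
import math


def cosine_similarity(v_a, v_h):
    """Cosine of the angle between two sparse term-weight vectors."""
    # Dot product over the intersection of terms only.
    dot = sum(w * v_h[t] for t, w in v_a.items() if t in v_h)
    # Magnitudes over all elements of each vector.
    norm_a = math.sqrt(sum(w * w for w in v_a.values()))
    norm_h = math.sqrt(sum(w * w for w in v_h.values()))
    if norm_a == 0.0 or norm_h == 0.0:
        return 0.0  # assumed convention for an empty vector
    return dot / (norm_a * norm_h)
```

For example, with `v_a = {"x": 3.0, "y": 4.0}` and `v_h = {"x": 3.0}`, the dot product is 9, the magnitudes are 5 and 3, and the score is 0.6.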

Block 232, which represents operation of classifier 132, entails determining a set of similarity scores based on the similarity of the class identifiers (or other meta-data) associated with the input headnote and those associated with each of the annotations. Before this determination is made, each annotation a is represented as an annotation-class vector V_(a) ^(C), with each element v_(a) ^(C) indicating the weight of a class identifier assigned to the headnotes cited by the annotation. Each element v_(a) ^(C) is defined as

$\begin{matrix}{v_{a}^{C} = {tf}_{a}^{C} \times {idf}_{a}^{C},} & (5)\end{matrix}$

where tf_(a) ^(C) denotes the frequency of the associated class identifier, and idf_(a) ^(C) denotes its inverse document frequency. idf_(a) ^(C) is defined as

$\begin{matrix}{{{idf}_{a}^{C} = {\log \left( \frac{N_{C}}{{df}^{\; C}} \right)}},} & (6)\end{matrix}$

where N_(C) is the total number of classes or class identifiers. In the exemplary embodiment, N_(C) is 91997, the total number of classes in the West Key Number System. df^(C) is the frequency of the class identifier amongst the set of class identifiers for annotation a. Unlike the exemplary annotation-text vectors, which are based on a selected set of annotation headnotes, the annotation-class vectors use all the class identifiers associated with all the headnotes that are associated with the annotation. Some embodiments may use class-identifier pairs, although they were found to be counterproductive in the exemplary implementation.

Similarly, each input headnote is also represented as a headnote-class vector V_(h) ^(C), with each element indicating the weight of a class or class identifier assigned to the headnote. Each element v_(h) ^(C) is defined as

$\begin{matrix}{v_{h}^{C} = {tf}_{h}^{C} \times {idf}_{h}^{C},} & (7)\end{matrix}$

with tf_(h) ^(C) denoting the frequency of the class identifier, and idf_(h) ^(C) denoting the inverse document frequency of the class identifier. idf_(h) ^(C) is defined as

$\begin{matrix}{{{idf}_{h}^{C} = {\log \left( \frac{N_{C}}{{df}_{a}^{C}} \right)}},} & (8)\end{matrix}$

where N_(C) is the total number of classes or class identifiers and df_(h) is the frequency of the class or class identifier amongst the set of class or class identifiers associated with the annotation.

Once the annotation-class and headnote-class vectors are established, classification processor 130 computes each similarity score S₂ as the cosine of the angle between them. This is expressed as

$\begin{matrix}{S_{2} = \cos\theta_{ah} = \frac{V_{a}^{C} \cdot V_{h}^{C}}{\| V_{a}^{C} \| \times \| V_{h}^{C} \|}.} & (9)\end{matrix}$

For headnotes that have more than one associated class identifier, the exemplary embodiment considers each class identifier separately from the others for that headnote, ultimately using the one yielding the maximum class-identifier similarity. The maximization criterion is used because, in some instances, a headnote may have two or more associated class identifiers (or Key Number classifications), indicating its discussion of two or more legal points. However, in most cases, only one of the class identifiers is relevant to a given annotation.
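The maximization over class identifiers may be sketched in Python as follows, purely for illustration. The sketch assumes sparse dictionaries from class identifier to weight, and the identifiers "K1", "K2", and "K3" in the usage note are hypothetical placeholders; each class identifier of the headnote is scored as its own one-element vector, and the best score is kept.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_similarity(annotation_classes, headnote_classes):
    """Score each headnote class identifier separately and keep the maximum."""
    return max(
        (cosine(annotation_classes, {k: w}) for k, w in headnote_classes.items()),
        default=0.0,  # assumed score for a headnote with no class identifiers
    )
```

For instance, with annotation weights `{"K1": 3.0, "K2": 4.0}` and headnote weights `{"K2": 2.0, "K3": 1.0}`, identifier "K2" alone scores 0.8 and "K3" scores 0, so the sketch returns 0.8.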

In block 233, classifier 133 determines a set of similarity scores S₃ based on the probability that a headnote is associated with a given annotation from class-identifier (or other meta-data) statistics. This probability is approximated by

$\begin{matrix}{S_{3} = P(h \mid a) = P(\{k\}_{h} \mid a) = \max\limits_{k^{\prime} \in \{k\}_{h}} P(k^{\prime} \mid a),} & (10)\end{matrix}$

where {k}_(h) denotes the set of class identifiers assigned to headnote h. Each annotation conditional class probability P(k|a) is estimated by

$\begin{matrix}{P(k \mid a) = \frac{1 + {tf}_{(k,a)}}{|a| + \sum\limits_{k^{\prime} \in a} {tf}_{(k^{\prime},a)}},} & (11)\end{matrix}$

where tf_((k,a)) is the term frequency of the k-th class identifier among the class identifiers associated with the headnotes of annotation a; |a| denotes the total number of unique class identifiers associated with annotation a (that is, the number of samples or cardinality of the set); and

$\sum\limits_{k^{\prime} \in a}\; {tf}_{({k^{\prime},a})}$

denotes the sum of the term frequencies for all the class identifiers.

The exemplary determination of similarity scores S₃ relies on assumptions that class identifiers are assigned to a headnote independently of each other, and that only one class identifier in {k}_(h) is actually relevant to annotation a. Although the one-class assumption does not hold for many annotations, it improves the overall performance of the system.
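A minimal Python sketch of equations (10) and (11) follows, offered for illustration only. The function names and the use of a dictionary of class-identifier counts per annotation are assumptions of the sketch.

```python
def class_probability(k, class_counts):
    """Smoothed estimate of P(k|a) per equation (11).

    class_counts maps each class identifier seen among the headnotes of
    annotation a to its frequency; len(class_counts) plays the role of |a|.
    """
    total = sum(class_counts.values())
    return (1 + class_counts.get(k, 0)) / (len(class_counts) + total)


def s3_score(headnote_classes, class_counts):
    """Equation (10): best class probability over the headnote's class ids."""
    return max(
        (class_probability(k, class_counts) for k in headnote_classes),
        default=0.0,  # assumed score for a headnote with no class identifiers
    )
```

With counts `{"k1": 3, "k2": 1}`, the denominator is 2 + 4 = 6, so a headnote carrying "k2" and an unseen identifier "k3" receives the larger of 2/6 and 1/6.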

Alternatively, one can multiply the conditional class-identifier (Key Number classifications) probabilities for the annotation, but this effectively penalizes headnotes with multiple Key Number classifications (class assignments), compared to those with single Key Number classifications. Some other embodiments use Bayes' rule to incorporate a priori probabilities into classifier 133. However, some experimentation with this approach suggests that system performance is likely to be inferior to that provided in this exemplary implementation.

The inferiority may stem from the fact that annotations are created at different times, and the fact that one annotation has more citations than another does not necessarily mean it is more probable to occur for a given headnote. Indeed, a greater number of citations might only reflect that one annotation has been in existence longer and/or updated more often than another. Thus, other embodiments might use the prior probabilities based on the frequency that class numbers are assigned to the annotations.

In block 234, classifier 134 determines a set of similarity scores S₄, based on P(a|h), the probability of each annotation given the text of the input headnote. In deriving a practical expression for computing P(a|h), the exemplary embodiment first assumes that an input headnote h is completely represented by a set of descriptors T, with each descriptor t assigned to a headnote with some probability, P(t|h). Then, based on the theory of total probability and Bayes' theorem, P(a|h) is expressed as

$\begin{matrix}\begin{matrix}{P(a \mid h) = \sum\limits_{t \in T} P(a \mid h,t) P(t \mid h)} \\{= \sum\limits_{t \in T} \frac{P(h \mid a,t) P(a \mid t)}{P(h \mid t)} P(t \mid h).}\end{matrix} & (12)\end{matrix}$

Assuming that a descriptor is independent of the class identifiers associated with a headnote allows one to make the approximation:

P(h|a,t)≈P(h|t)  (13)

and to compute the similarity scores S₄ according to

$\begin{matrix}{S_{4} = {{P\left( {ah} \right)} = {\sum\limits_{t \in T}\; {{P\left( {th} \right)}{P\left( {at} \right)}}}}} & (14)\end{matrix}$

where P(t|h) is approximated by

$\begin{matrix}{{P\left( {th} \right)} = {\frac{{tf}_{({t,h})}}{\sum\limits_{t^{\prime} \in T}\; {tf}_{({t^{\prime},h})}}.}} & (15)\end{matrix}$

tf_(t,h)) denotes the frequency of term t in the headnote and

$\sum\limits_{t^{\prime} \in T}\; {tf}_{({t^{\prime},h})}$

denotes the sum of the frequencies of all terms in the headnote. P(a|t)is defined according to Bayes' theorem as

$\begin{matrix}{{{P\left( {at} \right)} = \frac{{P\left( {ta} \right)}{P(a)}}{\sum\limits_{a^{\prime} \in A}{{P\left( {ta^{\prime}} \right)}{P\left( a^{\prime} \right)}}}},} & (16)\end{matrix}$

where P(a) denotes the prior probability for annotation a, and P(t|a), the probability of a descriptor t given annotation a, is estimated as

$\begin{matrix}{P(t \mid a) \cong \frac{1}{|a|} \sum\limits_{h \in a} P(t \mid h),} & (17)\end{matrix}$

and

$\sum\limits_{a^{\prime} \in A}$

denotes summation over all annotations a^(′) in the set of annotations A. Since all the annotation prior probabilities P(a) and P(a^(′)) are assumed to be equal, P(a|t) is computed using

$\begin{matrix}{P(a \mid t) = \frac{P(t \mid a)}{\sum\limits_{a^{\prime} \in A} P(t \mid a^{\prime})}.} & (18)\end{matrix}$

Block 235, which represents operation of composite-score generator 135, entails computing a set of composite similarity scores CS_(a) ^(h) based on the sets of similarity scores determined at blocks 231-234 by classifiers 131-134, with each composite score indicating the similarity of the input headnote h to each annotation a. More particularly, generator 135 computes each composite score CS_(a) ^(h) according to

$\begin{matrix}{{{CS}_{a}^{h} = {\sum\limits_{i = 1}^{4}\; {w_{ia}S_{a,i}^{h}}}},} & (19)\end{matrix}$

where S_(a,i) ^(h) denotes the similarity score of the i-th similarity score generator for the input headnote h and annotation a, and w_(ia) is a weight assigned to the i-th similarity score generator and annotation a. Thus, w_(ia) is a weight specific to the i-th similarity score generator and to annotation (or class) a. Execution of the exemplary method then continues at block 236.

At block 236, assignment decision-maker 136 recommends that the input headnote or a document, such as a case, associated with the headnote be classified or incorporated into one or more of the annotations based on the set of composite scores and decision criteria within decision-criteria module 137. In the exemplary embodiments, the headnote is assigned to annotations according to the following decision rule:

If CS_(a) ^(h)>Γ_(a), then recommend assignment of h or D_(h) to annotation a,  (20)

where Γ_(a) is an annotation-specific threshold from decision-criteria module 137 and D_(h) denotes a document, such as a legal opinion, associated with the headnote. (In the exemplary embodiment, each ALR annotation includes the text of associated headnotes and its full case citation.)
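As a rough illustration of equation (19) and decision rule (20), the following Python sketch combines four similarity scores with annotation-specific weights and compares the result to an annotation-specific threshold. The function names, the dictionary layout, and the numeric values are assumptions chosen for the example, not values from the exemplary embodiment.

```python
def composite_score(scores, weights):
    """Equation (19): weighted sum of the similarity scores for one annotation."""
    return sum(w * s for w, s in zip(weights, scores))


def recommend(scores_by_annotation, weights_by_annotation, thresholds):
    """Rule (20): recommend annotation a when its composite score exceeds its threshold."""
    recommended = []
    for a, scores in scores_by_annotation.items():
        cs = composite_score(scores, weights_by_annotation[a])
        if cs > thresholds[a]:
            recommended.append((a, cs))
    return recommended


# Hypothetical scores S1..S4, weights, and thresholds for two annotations.
scores = {"a1": [0.8, 0.6, 0.4, 0.9], "a2": [0.2, 0.1, 0.3, 0.2]}
weights = {"a1": [0.5, 0.2, 0.1, 0.2], "a2": [0.25, 0.25, 0.25, 0.25]}
thresholds = {"a1": 0.7, "a2": 0.5}
print(recommend(scores, weights, thresholds))
```

Because both the weights and the threshold are looked up per annotation, two annotations can react differently to identical similarity scores.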

The annotation-classifier weights w_(ia), for i=1 to 4, a∈A, and the annotation thresholds Γ_(a), a∈A, are learned during a tuning phase. The weights, 0≦w_(ia)≦1, reflect system confidence in the ability of each similarity score to route to annotation a. Similarly, the annotation thresholds Γ_(a), a∈A, are also learned and reflect the homogeneity of an annotation. In general, annotations dealing with narrow topics tend to have higher thresholds than those dealing with multiple related topics.

In this ALR embodiment, the thresholds reflect that over 90% of the headnotes (or associated documents) are not assigned to any annotations. Specifically, the exemplary embodiment estimates optimal annotation-classifier weights and annotation thresholds through exhaustive search over a five-dimensional space. The space is discretized to make the search manageable. The optimal weights are those corresponding to maximum precision at recall levels of at least 90%.

More precisely, this entails trying every combination of four weight variables, and for each combination, trying 20 possible threshold values over the interval [0,1]. The combination of weights and threshold that yields the best precision and recall is then selected. The exemplary embodiment excludes any weight-threshold combinations resulting in less than 90% recall.
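The tuning search described above might be sketched in Python as follows, for a single annotation. The weight grid of three values, the labeled-example format, and the tie-breaking on recall are assumptions made for illustration; the text specifies only that the space is discretized, that 20 threshold values are tried, and that combinations below 90% recall are excluded.

```python
from itertools import product


def tune_annotation(examples, weight_grid=(0.0, 0.5, 1.0),
                    n_thresholds=20, min_recall=0.9):
    """Exhaustive search over four weights and one threshold for one annotation.

    examples: list of (scores, label) pairs, where scores holds the four
    similarity scores and label is True when the headnote belongs to the
    annotation. Returns (precision, recall, weights, threshold) or None.
    """
    positives = sum(1 for _, label in examples if label)
    thresholds = [k / n_thresholds for k in range(n_thresholds)]
    best = None
    for weights in product(weight_grid, repeat=4):
        composites = [(sum(w * s for w, s in zip(weights, sc)), label)
                      for sc, label in examples]
        for gamma in thresholds:
            tp = sum(1 for cs, label in composites if cs > gamma and label)
            fp = sum(1 for cs, label in composites if cs > gamma and not label)
            if tp == 0:
                continue  # nothing correctly recommended at this setting
            recall = tp / positives
            if recall < min_recall:
                continue  # excluded: below the required recall level
            precision = tp / (tp + fp)
            candidate = (precision, recall, weights, gamma)
            # Prefer higher precision, then higher recall (assumed tie-break).
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    return best


# Hypothetical labeled tuning data for one annotation.
examples = [
    ([0.9, 0.1, 0.1, 0.1], True),
    ([0.8, 0.2, 0.1, 0.0], True),
    ([0.1, 0.9, 0.2, 0.1], False),
    ([0.2, 0.8, 0.1, 0.3], False),
]
print(tune_annotation(examples))
```

With a grid of g weight values the search visits g⁴ × 20 combinations per annotation, which is why the discretization has to stay coarse.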

To achieve higher precision levels, the exemplary embodiment effectively requires assignments to compete for their assigned annotations or target classifications. This competition entails use of the following rule:

Assign h to a, iff CS_(a) ^(h)>αŜ  (21)

where α denotes an empirically determined value greater than zero and less than 1, for example, 0.8; and Ŝ denotes the maximum composite similarity score associated with a headnote in {H_(a)}, the set of headnotes assigned to annotation a.
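One way to read rule (21) is sketched below in Python: among the headnotes competing for a single annotation, only those whose composite score exceeds a fixed fraction of the best composite score survive. The function name and the sample scores are hypothetical, and treating the competing candidates as the set over which the maximum is taken is an interpretation rather than something the text states outright.

```python
def compete(candidates, alpha=0.8):
    """Keep headnotes whose composite score exceeds alpha times the maximum.

    candidates: {headnote_id: composite score} for a single annotation.
    """
    if not candidates:
        return []
    s_hat = max(candidates.values())  # maximum composite score for the annotation
    return sorted(h for h, cs in candidates.items() if cs > alpha * s_hat)


# Hypothetical composite scores of three headnotes for one annotation.
print(compete({"h1": 0.90, "h2": 0.75, "h3": 0.60}))
```

Raising the multiplier toward 1 keeps only near-ties with the top-scoring headnote, which trades recall for precision.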

Block 240 entails processing classification recommendations from classification processor 130. To this end, processor 130 transfers classification recommendations to preliminary classification database 140 (shown in FIG. 1). Database 140 sorts the recommendations based on annotation, jurisdiction, or other relevant criteria and stores them in, for example, a single first-in-first-out (FIFO) queue, or as multiple FIFO queues based on single annotations or subsets of annotations.

One or more of the recommendations are then communicated by request or automatically to workcenter 150, specifically workstations 152, 154, and 156. Each of the workstations displays, automatically or in response to user activation, one or more graphical-user interfaces, such as graphical-user interface 152.1.

FIG. 4 shows an exemplary form of graphical-user interface 152.1. Interface 152.1 includes concurrently displayed windows or regions 410, 420, 430 and buttons 440-490.

Window 410 displays a recommendation list 412 of headnote identifiers from preliminary classification database 140. Each headnote identifier is logically associated with at least one annotation identifier (shown in window 430). Each of the listed headnote identifiers is selectable using a selection device, such as a keyboard or mouse or microphone. A headnote identifier 412.1 in list 412 is automatically highlighted by, for example, reverse-video presentation, upon selection. In response, window 420 displays a headnote 422 and a case citation 424, both of which are associated with each other and the highlighted headnote identifier 412.1. In further response, window 430 displays at least a portion or section of an annotation outline 432 (or classification hierarchy), associated with the annotation designated by the annotation identifier associated with headnote 412.1.

Button 440, labeled “New Section,” allows a user to create a new section or subsection in the annotation outline. This feature is useful, since in some instances, a headnote suggestion is good, but does not fit an existing section of the annotation. Creating the new section or subsection thus allows for convenient expansion of the annotation.

Button 450 toggles on and off the display of a text box describing headnote assignments made to the current annotation during the current session. In the exemplary embodiment, the text box presents each assignment in a short textual form, such as <annotation or class identifier><subsection or section identifier><headnote identifier>. This feature is particularly convenient for larger annotation outlines that exceed the size of window 430 and require scrolling contents of the window.

Button 460, labeled “Un-Allocate,” allows a user to de-assign, or declassify, a headnote from a particular annotation. Thus, if a user changes her mind regarding a previous, unsaved, classification, the user can nullify the classification. In some embodiments, headnotes identified in window 410 are understood to be assigned to the particular annotation section displayed in window 430 unless the user decides that the assignment is incorrect or inappropriate. (In some embodiments, acceptance of a recommendation entails automatic creation of hyperlinks linking the annotation to the case and the case to the annotation.)

Button 470, labeled “Next Annotation,” allows a user to cause display of the set of headnotes recommended for assignment to the next annotation. Specifically, this entails not only retrieving headnotes from preliminary classification database 140 and displaying them in window 410, but also displaying the relevant annotation outline within window 430.

Button 480, labeled “Skip Anno,” allows a user to skip the current annotation and its suggestions altogether and advance to the next set of suggestions and associated annotation. This feature is particularly useful when an editor wants another editor to review assignments to a particular annotation, or if the editor wants to review this annotation at another time, for example, after reading or studying the entire annotation text. The suggestions remain in preliminary classification database 140 until they are either reviewed or removed. (In some embodiments, the suggestions are time-stamped and may be supplanted with more current suggestions or deleted automatically after a preset period of time, with the time period, in some variations, dependent on the particular annotation.)

Button 490, labeled “Exit,” allows an editor to terminate an editorial session. Upon termination, acceptances and recommendations are stored in ALR annotations database 110.

FIG. 2 shows that after processing of the preliminary classifications, execution of the exemplary method continues at block 250. Block 250 entails updating of classification decision criteria. In the exemplary embodiment, this entails counting the numbers of accepted and rejected classification recommendations for each annotation, and adjusting the annotation-specific decision thresholds and/or classifier weights appropriately. For example, if 80% of the classification recommendations for a given annotation are rejected during one day, week, month, quarter or year, the exemplary embodiment may increase the decision threshold associated with that annotation to reduce the number of recommendations. Conversely, if 80% are accepted, the threshold may be lowered to ensure that a sufficient number of recommendations are being considered.
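The feedback loop of block 250 could look roughly like the Python sketch below. The 80% trigger comes from the example in the text; the step size of 0.05, the clamping to [0, 1], and the function name are assumptions added for illustration.

```python
def update_threshold(threshold, accepted, rejected, step=0.05, trigger=0.8):
    """Adjust one annotation's decision threshold from editorial feedback.

    Raises the threshold when at least `trigger` of the recommendations were
    rejected, lowers it when at least `trigger` were accepted, and otherwise
    leaves it unchanged. The step size is a hypothetical tuning parameter.
    """
    total = accepted + rejected
    if total == 0:
        return threshold  # no feedback in this period
    if rejected / total >= trigger:
        return min(1.0, threshold + step)
    if accepted / total >= trigger:
        return max(0.0, threshold - step)
    return threshold


print(update_threshold(0.50, accepted=2, rejected=8))  # mostly rejected: raised
print(update_threshold(0.50, accepted=9, rejected=1))  # mostly accepted: lowered
```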

Exemplary System for Classifying Headnotes to American Jurisprudence

FIG. 5 shows a variation of system 100 in the form of an exemplary classification system 500 tailored to facilitate classification of documents to one or more of the 135,000 sections of The American Jurisprudence (AmJur). Similar to an ALR annotation, each AmJur section cites relevant cases as they are decided by the courts. Likewise, updating AmJur is time consuming.

In comparison to system 100, classification system 500 includes six classifiers (classifiers 131-134 and classifiers 510 and 520), a composite score generator 530, and an assignment decision-maker 540. Classifiers 131-134 are identical to the ones used in system 100, with the exception that they operate on AmJur data as opposed to ALR data.

Classifiers 510 and 520 process AmJur section text itself, instead of proxy text based on headnotes cited within the AmJur section. More specifically, classifier 510 operates using the formulae underlying classifier 131 to generate similarity measurements based on the tf-idfs (term frequency-inverse document frequency) of noun-word pairs in AmJur section text. And classifier 520 operates using the formulae underlying classifier 134 to generate similarity measurements based on the probabilities of a section text given the input headnote.

Once the measurements are computed, each classifier assigns each AmJur section a similarity score based on a numerical ranking of its respective set of similarity measurements. Thus, for any input headnote, each of the six classifiers effectively ranks the 135,000 AmJur sections according to their similarities to the headnote. Given the differences in the classifiers and the data underlying their scores, it is unlikely that all six classifiers would rank the most relevant AmJur section the highest. Table 1 shows a partial ranked listing of AmJur sections, showing how each classifier scored, or ranked, their similarity to a given headnote.

TABLE 1
Partial Ranked Listing of AmJur Sections Based on Median of Six Similarity Scores

Section      C1 Ranks  C2 Ranks  C3 Ranks  C4 Ranks  C5 Ranks  C6 Ranks  Median Ranks
Section_1    1         8         4         1         3         2         2.5
Section_2    3         2         5         9         1         3         3
Section_3    2         4         6         5         4         4         4
Section_4    5         1         3         8         6         1         4
Section_5    7         3         2         2         5         5         4
Section_6    4         5         1         7         2         9         4.5
Section_7    8         7         8         4         7         6         7
Section_8    6         9         7         3         10        7         7
Section_9    9         10        9         6         9         10        9
Section_10   10        6         10        10        8         8         9

Composite score generator 530 generates a composite similarity score for each AmJur section based on its corresponding set of six similarity scores. In the exemplary embodiment, this entails computing the median of the six scores for each AmJur section. However, other embodiments can compute a uniformly or non-uniformly weighted average of all six or a subset of the six rankings. Still other embodiments can select the maximum, minimum, or mode as the composite score for the AmJur section. After generating the composite scores, the composite score generator forwards data identifying the AmJur section associated with the highest composite score, the highest composite score itself, and the input headnote to assignment decision-maker 540.
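The median-based composite can be checked against the first rows of Table 1 with a short Python sketch. The function names are hypothetical, and the sketch assumes that rank 1 is the best rank, so that the section with the lowest median rank is the one treated as having the highest composite score.

```python
from statistics import median

# Ranks assigned by classifiers C1..C6, taken from the first rows of Table 1.
ranks_by_section = {
    "Section_1": [1, 8, 4, 1, 3, 2],
    "Section_2": [3, 2, 5, 9, 1, 3],
    "Section_3": [2, 4, 6, 5, 4, 4],
    "Section_6": [4, 5, 1, 7, 2, 9],
}


def median_composites(ranks):
    """Composite score per section: the median of its six classifier ranks."""
    return {section: median(r) for section, r in ranks.items()}


def best_section(ranks):
    """Section with the best (lowest) median rank, assuming rank 1 is best."""
    composites = median_composites(ranks)
    return min(composites, key=composites.get)


print(median_composites(ranks_by_section))
print(best_section(ranks_by_section))
```

The median makes the composite insensitive to a single outlying classifier: Section_1 is ranked eighth by C2 yet still has the best median.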

Assignment decision-maker 540 provides a fixed portion of headnote-classification recommendations to preliminary classification database 140, based on the total number of input headnotes per a fixed time period. The fixed number and time period governing the number of recommendations are determined according to parameters within decision-criteria module 137. For example, one embodiment ranks all incoming headnotes for the time period, based on their composite scores, and recommends only those headnotes that rank in the top 16 percent.

In some instances, more than one headnote may have a composite score that equals a given cut-off threshold, such as top 16%. To ensure greater accuracy in these circumstances, the exemplary embodiment re-orders all headnote-section pairs that coincide with the cut-off threshold, using the six actual classifier scores.

This entails converting the six classifier scores for a particular headnote-section pair into six Z-scores and then multiplying the six Z-scores to produce a single similarity measure. (Z-scores are obtained by assuming that each classifier score has a normal distribution, estimating the mean and standard deviation of the distribution, and then subtracting the mean from the classifier score and dividing the result by the standard deviation.) The headnote-section pairs that meet the acceptance criteria are then re-ordered, or re-ranked, according to this new similarity measure, with as many as needed to achieve the desired number of total recommendations being forwarded to preliminary classification database 140. (Other embodiments may apply this “reordering” to all of the headnote-section pairs and then filter these based on the acceptance criteria necessary to obtain the desired number of recommendations.)
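The Z-score re-ordering can be sketched in Python as follows. The function names, the pair identifiers, and the per-classifier statistics are hypothetical; in particular, the text does not say which sample the mean and standard deviation are estimated from, so `fit_stats` simply takes whatever sample of scores it is given.

```python
from math import prod
from statistics import mean, stdev


def fit_stats(samples_per_classifier):
    """Estimate (mean, standard deviation) for each classifier from sample scores."""
    return [(mean(xs), stdev(xs)) for xs in samples_per_classifier]


def z_product(scores, stats):
    """Convert each classifier score to a Z-score and multiply the Z-scores."""
    return prod((s - mu) / sigma for s, (mu, sigma) in zip(scores, stats))


def rerank(tied_pairs, stats):
    """Order tied headnote-section pairs by the Z-score product, best first."""
    return sorted(tied_pairs,
                  key=lambda pair: z_product(tied_pairs[pair], stats),
                  reverse=True)


# Hypothetical statistics (mean 0.5, standard deviation 0.1) for six classifiers
# and two headnote-section pairs tied at the cut-off.
stats = [(0.5, 0.1)] * 6
tied = {"h1-s7": [0.6] * 6, "h2-s3": [0.7] * 6}
print(rerank(tied, stats))
```

One property worth keeping in mind with a plain product is that an even number of negative Z-scores multiplies to a positive value, so a practical variant may need to guard against pairs that score below the mean on several classifiers.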

Exemplary System for Classifying Headnotes to West Key Number System

FIG. 6 shows another variation of system 100 in the form of an exemplary classification system 600 tailored to facilitate classification of input headnotes to classes of the West Key Number System. The Key Number System is a hierarchical classification system with 450 top-level classes, which are further subdivided into 92,000 sub-classes, each having a unique class identifier. In comparison to system 100, system 600 includes classifiers 131 and 134, a composite score generator 610, and an assignment decision-maker 620.

In accord with previous embodiments, classifiers 131 and 134 model each input headnote as a feature vector of noun-word pairs and each class identifier as a feature vector of noun-word pairs extracted from headnotes assigned to it. Classifier 131 generates similarity scores based on the tf-idf products for noun-word pairs in headnotes assigned to each class identifier and to a given input headnote. And classifier 134 generates similarity scores based on the probabilities of a class identifier given the input headnote. Thus, system 600 generates over 184,000 similarity scores, with each score representing the similarity of the input headnote to a respective one of the over 92,000 class identifiers in the West Key Number System using a respective one of the two classifiers.

Composite score generator 610 combines the two similarity measures for each possible headnote-class-identifier pair to generate a respective composite similarity score. In the exemplary embodiment, this entails defining, for each class or class identifier, two normalized cumulative histograms (one for each classifier) based on the headnotes already assigned to the class. These histograms approximate corresponding cumulative density functions, allowing one to determine the probability that a given percentage of the class identifiers scored below a certain similarity score.

More particularly, the two cumulative normalized histograms for class-identifier c, based on classifiers 131 and 134, are respectively denoted F_(c) ¹ and F_(c) ², and estimated according to:

$\begin{matrix}{{F_{c}^{1}(s) = F_{c}^{1}(s - 0.01) + \frac{1}{M_{c}} * \left| \left\{ h_{i} \mid S_{i}^{1} = s \right\} \right|}\quad\text{and}} & (22) \\{F_{c}^{2}(s) = F_{c}^{2}(s - 0.01) + \frac{1}{M_{c}} * \left| \left\{ h_{i} \mid S_{i}^{2} = s \right\} \right|,} & (23)\end{matrix}$

where c denotes a particular class or class identifier; s=0, 0.01, 0.02, 0.03, . . . , 1.0; F(s<0)=0; M_(c) denotes the number of headnotes classified to or associated with class or class identifier c; |{B}| denotes the number of elements in the set B; h_(i), i=1, . . . , M_(c), denotes the set of headnotes already classified or associated with class or class identifier c; S_(i) ¹ denotes the similarity score for headnote h_(i) and class-identifier c, as measured by classifier 131; and S_(i) ² denotes the similarity score for headnote h_(i) and class-identifier c, as measured by classifier 134. (In this context, each similarity score indicates the similarity of a given assigned headnote to all the headnotes assigned to class c.) In other words, |{h_(i)|S_(i) ¹=s}| denotes the number of headnotes assigned to class c that received a score of s from classifier 131, and |{h_(i)|S_(i) ²=s}| denotes the number of headnotes assigned to class c that received a score of s from classifier 134.

Thus, for every possible score value (between 0 and 1 with a particular score spacing), each histogram provides the percentage of assigned headnotes that scored higher and lower than that particular score. For example, for classifier 131, the histogram for class identifier c might show that 60% of the set of headnotes assigned to class c scored higher than 0.7 when compared to the set of headnotes as a whole; whereas for classifier 134 the histogram might show that 50% of the assigned headnotes scored higher than 0.7.

Next, composite score generator 610 converts each score for the input headnote into a normalized similarity score using the corresponding histogram and computes each composite score for each class based on the normalized scores. In the exemplary embodiment, this conversion entails mapping each classifier score to the corresponding histogram to determine its cumulative probability and then multiplying the cumulative probabilities of respective pairs of scores associated with a given class c to compute the respective composite similarity score. The set of composite scores for the input headnote is then processed by assignment decision-maker 620.
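The histogram normalization of equations (22) and (23) and the product of cumulative probabilities might be sketched in Python as below. The function names and sample scores are hypothetical, scores are assumed to lie in [0, 1], and each score is rounded to the nearest 0.01 grid point, which is an assumption about how off-grid scores would be binned.

```python
def cumulative_histogram(assigned_scores, step=0.01):
    """Normalized cumulative histogram F(s) on the grid s = 0, step, ..., 1.0.

    assigned_scores: scores (in [0, 1]) that one classifier gave to the
    headnotes already assigned to a class.
    """
    n_bins = round(1 / step) + 1
    counts = [0] * n_bins
    for s in assigned_scores:
        counts[round(s / step)] += 1  # nearest grid point (assumed binning)
    m = len(assigned_scores)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / m)  # F(s) = F(s - step) + count(s) / M_c
    return cdf


def normalized(score, cdf, step=0.01):
    """Cumulative probability of an input-headnote score under one histogram."""
    index = min(len(cdf) - 1, max(0, round(score / step)))
    return cdf[index]


def composite(score_131, score_134, cdf_131, cdf_134):
    """Composite similarity for one class: product of the two cumulative probabilities."""
    return normalized(score_131, cdf_131) * normalized(score_134, cdf_134)


# Hypothetical scores of the headnotes already assigned to class c.
cdf_131 = cumulative_histogram([0.2, 0.4, 0.6, 0.8])
cdf_134 = cumulative_histogram([0.1, 0.3, 0.5, 0.7])
print(composite(0.6, 0.5, cdf_131, cdf_134))
```

Mapping raw scores through class-specific histograms puts the two classifiers on a common 0-to-1 scale before they are multiplied, so neither dominates merely because its raw scores run higher.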

Assignment decision-maker 620 forwards a fixed number of the top scoring class identifiers to preliminary classification database 140. The exemplary embodiments suggest the class identifiers having the top five composite similarity scores for every input headnote.

Other Exemplary Applications

The components of the various exemplary systems presented can be combined in myriad ways to form other classification systems of both greater and lesser complexity. Additionally, the components and systems can be tailored for types of documents other than headnotes. Indeed, the components and systems and embodied teachings and principles of operation are relevant to virtually any text or data classification context.

For example, one can apply one or more of the exemplary systems and related variations to classify electronic voice and mail messages. Some mail classifying systems may include one or more classifiers in combination with conventional rules which classify messages as useful or SPAM based on whether the sender is in the recipient's address book, is in the same domain as the recipient, etc.

APPENDIX A Exemplary Stop Words

-   a a.m ab about above accordingly across ad after afterward    afterwards again against ago ah ahead ain't all allows almost alone    along already alright also although always am among amongst an and    and/or anew another ante any anybody anybody's anyhow anymore anyone    anyone's anything anything's anytime anytime's anyway anyways    anywhere anywhere's anywise appear approx are aren't around as aside    associated at available away awfully awhile b banc be became because    become becomes becoming been before beforehand behalf behind being    below beside besides best better between beyond both brief but by    bythe c came can can't cannot cant cause causes certain certainly    cetera cf ch change changes cit cl clearly cmt co concerning    consequently consider contain containing contains contra    corresponding could couldn't course curiam currently d day days dba    de des described di did didn't different divers do does doesn't    doing don't done down downward downwards dr du during e e.g each ed    eds eg eight eighteen eighty either eleven else elsewhere enough    especially et etc even ever evermore every everybody everybody's    everyone everyone's everyplace everything everything's everywhere    everywhere's example except f facie facto far few fewer fide fides    followed following follows for forma former formerly forth forthwith    fortiori fro from further furthermore g get gets getting given gives    go goes going gone got gotten h had hadn't happens hardly has hasn't    have haven't having he he'd he'll he's hello hence henceforth her    here here's hereabout hereabouts hereafter herebefore hereby herein    hereinafter hereinbefore hereinbelow hereof hereto heretofore    hereunder hereunto hereupon herewith hers herself hey hi him himself    his hither hitherto hoc hon how howbeit however howsoever hundred i    i'd i'll i'm i've i.e ibid ibidem id ie if ignored ii iii illus    immediate in inasmuch inc indeed indicate indicated indicates 
infra    initio insofar instead inthe into intra inward ipsa is isn't it it's    its itself iv ix j jr judicata just k keep kept kinda know known    knows l la last later latter latterly le least les less lest let    let's like likewise little looks ltd m ma'am many may maybe me    meantime meanwhile mero might million more moreover most mostly motu    mr mrs ms much must my myself name namely naught near necessary    neither never nevermore nevertheless new next no no-one nobody nohow    nolo nom non none nonetheless noone nor normally nos not nothing    novo now nowhere o o'clock of ofa off ofhis oft often ofthe ofthis    oh on once one one's ones oneself only onthe onto op or other others    otherwise ought our ours ourself ourselves out outside over overall    overly own p p.m p.s par para paras pars particular particularly    passim per peradventure percent perchance perforce perhaps pg pgs    placed please plus possible pp probably provides q quite r rata    rather really rel relatively rem res resp respectively right s sa    said same says se sec seem seemed seeming seems seen sent serious    several shall shalt she she'll she's should shouldn't since sir so    some somebody somebody's somehow someone someone's something    something's sometime sometimes somewhat somewhere somewhere's    specified specify specifying still such sundry sup t take taken tam    than that that's thats the their theirs them themselves then thence    thenceforth thenceforward there there's thereafter thereby therefor    therefore therefrom therein thereof thereon theres thereto    theretofore thereunto thereupon therewith these they they'll thing    things third this thither thorough thoroughly those though three    through throughout thru thus to to-wit together too toward towards u    uh unless until up upon upward upwards used useful using usually v    v.s value various very vi via vii viii virtually vs w was wasn't way    we we'd we'll we're we've well went were weren't what 
what'll what's    whatever whatsoever when whence whenever where whereafter whereas    whereat whereby wherefore wherefrom wherein whereinto whereof    whereon wheresoever whereto whereunder whereunto whereupon wherever    wherewith whether which whichever while whither who who'd who'll    who's whoever whole wholly wholy whom whose why will with within    without won't would wouldn't x y y'all ya'll ye yeah yes yet you    you'll you're you've your yours yourself yourselves z

CONCLUSION

In furtherance of the art, the inventors have presented various exemplary systems, methods, and software which facilitate the classification of text, such as headnotes or associated legal cases, to a classification system, such as that represented by the nearly 14,000 ALR annotations. The exemplary system classifies or makes classification recommendations based on text and class similarities and probabilistic relations. The system also provides a graphical-user interface to facilitate editorial processing of recommended classifications and thus automated update of document collections, such as the American Legal Reports, American Jurisprudence, and countless others.

The embodiments described above are intended only to illustrate and teach one or more ways of practicing or implementing the present invention, not to restrict its breadth or scope. The actual scope of the invention, which embraces all ways of practicing or implementing the teachings of the invention, is defined only by the following claims and their equivalents.

1. An automated method of classifying input text according to a target classification system having two or more target classes, the method comprising: for each target class, determining a composite score based on a first score scaled by a first class-specific weight for the target class and a second score scaled by a second class-specific weight for the target class, with the first and second scores based on an input text and text associated with the target class; and for each target class, classifying or recommending classification of the input text to the target class based on the composite score and a class-specific decision threshold for the target class.
2. The method of claim 1, wherein the first and second scores are based on at least one of: a score based on similarity of at least one or more portions of the input text to text associated with the target class; a score based on similarity of a set of one or more non-target classes associated with the input text and a set of one or more non-target classes associated with the target class; a score based on probability of the target class given a set of one or more non-target classes associated with the input text; and a score based on probability of the target class given at least a portion of the input text.
3. The method of claim 1, further comprising: updating the class-specific threshold for one of the target classes based on acceptance or rejection of recommended classifications of the input text.
4. An automated method of classifying text to one or more target classes in a target classification system, the method comprising: identifying one or more noun-word pairs in a portion of text; and determining one or more scores based on frequencies of one or more of the identified noun-word pairs in the portion of text and one or more noun-word pairs in text associated with one of the target classes.
5. The method of claim 4, wherein identifying one or more noun-word pairs in the portion of text comprises: identifying a first noun in the portion of text; and identifying one or more words within a predetermined number of words of the first noun.
6. The method of claim 5, wherein identifying one or more words within a predetermined number of words of the first noun comprises excluding a set of one or more stop words.
7. The method of claim 4, wherein the portion of text is a paragraph.
8. The method of claim 4, wherein the one or more scores include: at least one score based on similarity of at least one or more portions of the input text to text associated with the target class; at least one score based on similarity of a set of one or more non-target classes associated with the input text and a set of one or more non-target classes associated with the target class; at least one score based on probability of the target class given a set of one or more non-target classes associated with the input text; and at least one score based on probability of the target class given at least a portion of the input text.
9. The method of claim 4, wherein determining one or more scores based on one or more identified noun-word pairs and one or more noun-word pairs in other text associated with one of the target classes, comprises: determining a respective weight for each identified noun-word pair, with the respective weight based on a product of a term frequency of the identified word-noun pair in the text and an inverse document frequency of the noun-word pairs in the other text associated with one of the target classes.
10. An automated method of classifying input text to one or more target classes in a target classification system, the method comprising: identifying a first set of noun-word pairs in the input text, with the first set including at least one noun-word pair formed from a noun and non-adjacent word in the input text; identifying two or more second sets of noun-word pairs, with each second set including at least one noun-word pair formed from a noun and non-adjacent word in text associated with a respective one of the target classes; determining a set of scores based on the first and second sets of noun-word pairs; and classifying or recommending classification of the input text to one or more of the target classes based on the set of scores.
 11. A system for classifying input text to a targetclassification system having two or more target classes, the systemcomprising: a scoring module for determining for each of the targetclasses at least first and second scores based on the input text and thetarget class; a composite scoring module for determining for each of thetarget classes a corresponding composite score based on the first scorescaled by a first class-specific weight for the target class and thesecond score scaled by a second class-specific weight for the targetclass; and a classification module for determining for each of thetarget classes whether to classify or recommend classification of theinput text to the target class based on the corresponding compositescore and a class-specific decision threshold for the target class. 12.The system of claim 11: wherein the scoring module comprises means fordetermining for each of the target classes at least first and secondscores based on the input text and the target class; wherein the acomposite scoring module comprises means for determining for each of thetarget classes a corresponding composite score based on the first scorescaled by a first class-specific weight for the target class and thesecond score scaled by a second class-specific weight for the targetclass; and wherein the classification module comprises means fordetermining for each of the target classes whether to classify orrecommend classification of the input text to the target class based onthe corresponding composite score and a class-specific decisionthreshold for the target class.
 13. A machine-readable medium comprisinginstructions related to classifying input text to a targetclassification system having two or more target classes, theinstructions comprising: a first set of instructions for determiningfirst and second scores based on the input text and one of the targetclasses, wherein the first score is based on: similarity of at least oneor more portions of the input text to text associated with the onetarget class; or similarity of a set of one or more non-target classesassociated with the input text and a set of one or more non-targetclasses associated with the one target class; and wherein the secondscore is based on: probability of the one target class given a set ofone or more non-target classes associated with the input text; orprobability of the one target class given at least a portion of theinput text; a second set of instructions for determining a compositescore based on the first and second scores; and a third set ofinstructions for comparing the composite score to a decision threshold.14. The medium of claim 13, wherein the second set of instructions fordetermining the composite score based on the first and second scorescomprises instructions for weighting the first and second scores byrespective first and second class-specific weights associated with theone target class; and adding the weighted first score to the secondweighted scores.
 15. The medium of claim 13, wherein the first score is based on a set of one or more noun-word pairs associated with the input text and a set of one or more noun-word pairs associated with the one target class, with at least one noun-word pair in each set including a noun and a non-adjacent word.
 16. The medium of claim 15, wherein the noun and the non-adjacent word are no more than 32 words apart, excluding stop words.
 17. The computer readable medium of claim 13, wherein each target class is a document and the text associated with the one target class comprises text of the document or text of another document associated with the target class.
 18. A machine-readable medium comprising instructions for classifying input text to a target classification system having two or more target classes, the instructions comprising: a first set of instructions for determining first and second scores based on the input text and one of the target classes, wherein the first score is based on similarity of a set of one or more non-target classes associated with the input text and a set of one or more non-target classes associated with the one target class; and wherein the second score is based on probability of the one target class given at least a portion of the input text; a second set of instructions for determining a composite score based on a linear combination of the first and second scores; and a third set of instructions for comparing the composite score to a decision threshold.
 19. The medium of claim 18, wherein the first score is based on a set of one or more noun-word pairs associated with the input text and a set of one or more noun-word pairs associated with the one target class, with at least one noun-word pair in each set including a noun and a non-adjacent word.
 20. The medium of claim 19, wherein the noun and the non-adjacent word are no more than 32 words apart, excluding stop words.
 21. The medium of claim 18, wherein each target class is a document and the text associated with the one target class comprises text of the document or text of another document associated with the target class.
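
For illustration only, the composite scoring and threshold comparison recited in claims 11, 14, and 18 might be sketched as follows. This is a minimal sketch, not the claimed implementation: the names `ClassParams`, `composite_score`, and `recommend` are hypothetical, the two input scores are assumed to be precomputed elsewhere, and the use of `>=` for the threshold comparison is an assumption, since the claims recite only comparing the composite score to a decision threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClassParams:
    """Hypothetical per-class parameters; all names are illustrative."""
    w1: float         # class-specific weight applied to the first score
    w2: float         # class-specific weight applied to the second score
    threshold: float  # class-specific decision threshold


def composite_score(s1: float, s2: float, p: ClassParams) -> float:
    """Linear combination of two scores scaled by class-specific weights."""
    return p.w1 * s1 + p.w2 * s2


def recommend(scores: dict[str, tuple[float, float]],
              params: dict[str, ClassParams]) -> list[str]:
    """Return target classes whose composite score meets their own threshold.

    `scores` maps each target class to its (first, second) score for one
    input text. The `>=` comparison is an assumption of this sketch.
    """
    selected = []
    for target_class, (s1, s2) in scores.items():
        p = params[target_class]
        if composite_score(s1, s2, p) >= p.threshold:
            selected.append(target_class)
    return selected
```

With hypothetical parameters `ClassParams(0.5, 0.5, 0.5)` for a class "A" and scores `(0.75, 0.5)`, the composite score is 0.625, which meets the threshold, so "A" would be recommended. Keeping the weights and threshold per class lets each of potentially thousands of target classes be tuned independently.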